ABSTRACT



Markov chains with large transition probability matrices
occur in many applications such as manpower models. Under 
certain conditions the state space of a stationary discrete 
parameter finite Markov chain may be partitioned into 
subsets, each of which may be treated as a single state of a 
smaller chain that retains the Markov property. Such a chain 
is said to be "lumpable" and the resulting lumped chain is a 
special case of more general functions of Markov chains. 

There are several reasons why one might wish to lump. 
First, there may be analytical benefits, including relative 
simplicity of the reduced model and development of a new
model which inherits known or assumed strong properties of
the original model (the Markov property). Second, there may
be statistical benefits, such as increased robustness of the 
smaller chain as well as improved estimates of transition 
probabilities. Finally, the identification of lumps may 
provide new insights about the process under investigation. 

However, a problem that arises in connection with practical
applications of Markov chain models is to determine
whether the chain is lumpable. This is especially difficult
when the matrix $P = \{p_{ij}\}$ of transition probabilities is
estimated from transition data. In this case, it is desirable
to find bounds on $\Delta$, the largest error
$|\hat{p}_{ij} - p_{ij}|$ in estimating $p_{ij}$, for all i and j.

This thesis examines the sensitivity of the lumping
conditions based on $\hat{P}$, the estimate of P. In general, it is
found that the classical lumping conditions are extremely
sensitive to the estimation error which can be expected to
occur even with large data sets. Thus, these conditions may
be of limited value in many actual applications.

I. INTRODUCTION



Markov chains with large transition probability matrices 
occur in many applications such as manpower models. Under 
certain conditions the state space of a stationary discrete 
parameter finite Markov chain may be partitioned into 
subsets, each of which may be treated as a single state of a 
smaller chain that retains the Markov property. Such a chain 
is said to be "lumpable" and the resulting lumped chain is a 
special case of more general functions of Markov chains. 

Consider a Markov chain $\{X_t : t = 0,1,2,\ldots\}$ with finite
state space $S = \{1,2,\ldots,n\}$, stationary transition
probability matrix $P = \{p_{ij}\}$, and a priori distribution of
"initial states", $p^0 = (p_1^0, p_2^0, \ldots, p_n^0)$. Let $\tilde{S}$ denote a
nontrivial partition of S into m < n "lumps", say
$\tilde{S} = \{L(1), L(2), \ldots, L(m)\}$. If $\{X_t\}$ is lumpable with respect
to $\tilde{S}$, denote by $\{\tilde{X}_t\}$ the lumped chain with state space $\tilde{S}$ and
transition probability matrix $\tilde{P}$.

A well-known characterization [Ref. 2] is that $\{X_t\}$ is
lumpable to $\{\tilde{X}_t\}$ if and only if there exist matrices A and B
such that

\[ BAPB = PB \tag{1.1} \]

where B consists of m nonzero orthogonal n-dimensional
column vectors whose components are zeros or ones, and A is
B' with rows normalized to probability vectors (i.e.,
$A = (B'B)^{-1}B'$). The positions of the 1's in each column of B
correspond to states in S that together form a lump in $\tilde{S}$.
It follows that if BAPB = PB is satisfied, then $\tilde{P} = APB$, as
is shown in Chapter 2.

Many of the mathematical quantities associated with $\{X_t\}$
can be transformed directly to corresponding quantities for
$\{\tilde{X}_t\}$, using the lumping matrix B. In Chapter 2, for example,
we show that if an original Markov chain $\{X_t\}$ is lumpable to
$\{\tilde{X}_t\}$ and $\{\tilde{X}_t\}$ is further lumpable to $\{\tilde{\tilde{X}}_t\}$, then $\{X_t\}$ is
directly lumpable to $\{\tilde{\tilde{X}}_t\}$, and we give the lumping matrix
for $\{\tilde{\tilde{X}}_t\}$ in terms of the underlying two lumpings.

There are several reasons why one might wish to lump 
[Ref. 1]. First, there may be analytical benefits, 
including relative simplicity of the reduced model and 
development of a new model which inherits known or assumed 
strong properties of the original model (the Markov
property). Second, there may be statistical benefits, such as
increased robustness of the smaller chain as well as
improved estimates of transition probabilities. Finally,
the identification of lumps may provide new insights about 
the process under investigation. 

However, a problem that arises in connection with prac- 
tical applications of Markov chain models is to determine 
whether the chain is lumpable. For chains with large state 
spaces S, it is practically impossible to use an exhaustive 
search to determine whether lumpability conditions such as 
those given in equation (1.1) are met for some matrices B, 
because of the large number of ways of partitioning S, i.e., the
large number of candidate B matrices. For example, if S has 
10 elements, there are 115,975 partitions of S. 

Another problem is to estimate the matrix $P = \{p_{ij}\}$ of
transition probabilities and to find bounds on $\Delta$, the
largest error of $\hat{p}_{ij}$ for all i and j. We shall investigate
the sensitivity of the lumping conditions in equation
(1.1) for varying $\Delta$. If $\{X_t\}$ is lumpable with lumping
matrix B, is condition (1.1) satisfied with P replaced by
the estimate $\hat{P}$?

This thesis will attempt to examine the sensitivity of
the lumping conditions based on reasonable estimation errors
when P is not known and must be estimated by $\hat{P}$. We
describe these facts about lumpability using eigenvalues and
eigenvectors, including the theorem mentioned by D.R. Barr
and M.O. Thomas [Ref. 3]. We do not review elementary
concepts of Markov chains here; the reader may wish to
consult [Ref. 2] and [Ref. 4] for review of basic facts and
specific terminologies such as lumpability, regular Markov
chain, etc.

II. THEORY OF LUMPABILITY



This chapter will cover general facts about lumping, such
as conditions for lumping, the number of partitions
possible for any given size of state space S, and theorems
associated with eigenvector conditions for Markov chain
lumpability.

A. CONDITIONS FOR LUMPING

Consider a Markov chain $\{X_t : t = 0,1,2,\ldots\}$ with finite
state space $S = \{1,2,\ldots,n\}$, stationary transition
probability matrix $P = \{p_{ij}\}$, and a priori distribution of
"initial states", $p^0 = (p_1^0, p_2^0, \ldots, p_n^0)$. Let $\tilde{S}$ denote a
nontrivial partition of S into m < n "lumps", that is
$\tilde{S} = \{L(1), L(2), \ldots, L(m)\}$. If $\{X_t\}$ is lumpable with
respect to $\tilde{S}$, denote by $\{\tilde{X}_t\}$ the lumped chain with state
space $\tilde{S}$ and transition probability matrix $\tilde{P}$.

We now show that if the condition (1.1) for lumpability
with respect to the lumping matrix B,

\[ BAPB = PB \tag{2.1} \]

is satisfied, then the lumped transition matrix $\tilde{P}$ is given
by

\[ \tilde{P} = APB. \tag{2.2} \]



Proof. $\tilde{p}_{ij}$ is the sum $\sum_{k \in L(j)} p_{ik}$, where L(j) is the partition
subset containing $j \in S$ and i is any element of L(i). By
the lumpability condition, this value is the same for any
$i \in L(i)$. But the product PB sums the columns of P in
accordance with the partition subsets indicated by the columns of
B. Hence, PB is an n x m matrix with rows repeated in
accordance with the partition sets L(1), L(2), ..., L(m); the
effect of pre-multiplying by $A = (B'B)^{-1}B'$ is to "average"
these common rows, yielding an m x m matrix $\tilde{P}$ without the
repeated rows. But such "averages" are just the common rows
being averaged. Hence, $\tilde{P} = APB$ is the m x m transition
matrix of the lumped chain with state space
{L(1), L(2), ..., L(m)}.



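Example 1. Consider a transition probability matrix P
with 4 states which can be partitioned into
$\tilde{S} = \{\{1\}, \{2,3\}, \{4\}\} = \{L(1), L(2), L(3)\}$. Let

\[
P = \begin{pmatrix}
1/4 & 1/16 & 3/16 & 1/2 \\
0   & 1/12 & 1/12 & 5/6 \\
0   & 1/12 & 1/12 & 5/6 \\
7/8 & 1/32 & 3/32 & 0
\end{pmatrix}, \quad
B = \begin{pmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{pmatrix}
\quad \text{and} \quad
A = \begin{pmatrix}
1 & 0   & 0   & 0 \\
0 & 0.5 & 0.5 & 0 \\
0 & 0   & 0   & 1
\end{pmatrix}.
\]

Then we know equation (2.1) is satisfied. Thus, the lumped
transition matrix with partitioning $\tilde{S}$ is

\[
\tilde{P} = APB = \begin{pmatrix}
0.25  & 0.25  & 0.5   \\
0     & 0.167 & 0.833 \\
0.875 & 0.125 & 0
\end{pmatrix}.
\]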
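The check in Example 1 can be reproduced with exact rational arithmetic. The following is a minimal sketch, not part of the original thesis: the helper names `matmul`, `lumping_matrices` and `is_lumpable` are hypothetical, and the entries of P are taken from the matrix printed in Example 2, which is the same chain.

```python
from fractions import Fraction as F

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lumping_matrices(partition, n):
    """Return (A, B) for a partition of the states 1..n.

    B is the n x m 0/1 indicator matrix of the lumps, and
    A = (B'B)^{-1} B' is B' with each row normalized to sum to one.
    """
    B = [[F(int(s in lump)) for lump in partition] for s in range(1, n + 1)]
    A = [[F(int(s in lump), len(lump)) for s in range(1, n + 1)]
         for lump in partition]
    return A, B

def is_lumpable(P, A, B):
    """Check the lumpability condition BAPB = PB exactly."""
    PB = matmul(P, B)
    return matmul(B, matmul(A, PB)) == PB

# Transition matrix of Example 1 (assumed identical to that of Example 2).
P = [[F(1, 4), F(1, 16), F(3, 16), F(1, 2)],
     [F(0),    F(1, 12), F(1, 12), F(5, 6)],
     [F(0),    F(1, 12), F(1, 12), F(5, 6)],
     [F(7, 8), F(1, 32), F(3, 32), F(0)]]

A, B = lumping_matrices([{1}, {2, 3}, {4}], 4)
P_lumped = matmul(A, matmul(P, B))   # lumped transition matrix APB

print(is_lumpable(P, A, B))
print([[str(v) for v in row] for row in P_lumped])
```

Working with fractions rather than floats lets the condition BAPB = PB be tested by exact equality, which matters later when the effect of small estimation errors is considered.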



Many of the mathematical quantities associated with $\{X_t\}$
can be transformed directly to corresponding quantities for
$\{\tilde{X}_t\}$, using A and B of equation (2.1). For example, since
AB is the m-dimensional identity matrix, it follows that for
s a positive integer,

\[ (\tilde{P})^s = (APB)^s = A(PB)A(PB) \cdots A(PB) = AP^sB. \tag{2.3} \]

We now show that $(\tilde{P})^s = AP^sB$:

\begin{align*}
(\tilde{P})^s &= (APB)(APB)(APB) \cdots (APB) \\
  &= AP(BAPB)(APB) \cdots (APB) \\
  &= AP(PB)(APB) \cdots (APB) \\
  &= AP^2(BAPB) \cdots (APB) \\
  &= AP^2(PB) \cdots (APB) \\
  &\;\;\vdots \\
  &= AP^sB.
\end{align*}

But $AP^sB = \widetilde{P^s}$, since

\begin{align*}
BAPB &= PB \\
PBAPB &= P^2B \\
BAPBAPB &= P^2B \\
BAP^2B &= P^2B \\
  &\;\;\vdots \\
BAP^sB &= P^sB,
\end{align*}

so $P^s$ is lumpable with the same matrix B and $\widetilde{P^s} = AP^sB$.
This implies in turn that if $\{X_t\}$ has steady state
distribution $\pi$, then $\{\tilde{X}_t\}$ has steady state distribution $\tilde{\pi} = \pi B$.

Theorem 1. The steady state distribution $\tilde{\pi}$ of the
lumped chain $\{\tilde{X}_t\}$ is $\pi B$, where $\pi = \pi P$.

Proof.
\begin{align*}
\pi B &= (\pi P)B \\
  &= \pi B(APB) \\
  &= (\pi B)\tilde{P}.
\end{align*}
Therefore, $\tilde{\pi} = \pi B$.

Similarly, the a priori distribution $\tilde{p}^0$ of the initial
state of the lumped chain corresponding to that of the
original chain, $p^0$, is given by $\tilde{p}^0 = p^0B$, since by equations
(2.1) and (2.3),

\begin{align*}
p^0P^sB &= p^0P \cdots PB \\
  &= p^0P \cdots PBAPB \\
  &= p^0P \cdots PB\tilde{P} \\
  &\;\;\vdots \\
  &= p^0B\tilde{P}^s.
\end{align*}

Note that $p^0P^sB$ is the distribution of lumped states
occupied by the lumped chain after s transitions. Since this
equals $p^0B\tilde{P}^s = \tilde{p}^0\tilde{P}^s$, it follows that $\tilde{p}^0 = p^0B$.
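Equation (2.3) and Theorem 1 can be checked numerically on the chain of Example 1. This is an illustrative sketch only: `matmul` and `matpow` are hypothetical helper names, and the steady state vector is written as the exact fractions (7/16, 7/128, 17/128, 3/8), which are assumed to be the values that Example 2 prints to four decimals.

```python
from fractions import Fraction as F

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def matpow(X, s):
    """X**s for a positive integer s, by repeated multiplication."""
    R = X
    for _ in range(s - 1):
        R = matmul(R, X)
    return R

P = [[F(1, 4), F(1, 16), F(3, 16), F(1, 2)],
     [F(0),    F(1, 12), F(1, 12), F(5, 6)],
     [F(0),    F(1, 12), F(1, 12), F(5, 6)],
     [F(7, 8), F(1, 32), F(3, 32), F(0)]]
B = [[1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1]]
A = [[1, 0, 0, 0], [0, F(1, 2), F(1, 2), 0], [0, 0, 0, 1]]
P_lumped = matmul(A, matmul(P, B))

# Equation (2.3): the s-step lumped matrix equals A P^s B.
powers_agree = all(
    matmul(A, matmul(matpow(P, s), B)) == matpow(P_lumped, s)
    for s in range(1, 6)
)

# Theorem 1: pi B is stationary for the lumped chain.
pi = [[F(7, 16), F(7, 128), F(17, 128), F(3, 8)]]
pi_lumped = matmul(pi, B)

print(powers_agree, [str(v) for v in pi_lumped[0]])
```

The same pattern applies to the initial distribution: replacing `pi` by any row vector $p^0$ and comparing `matmul(matmul(p0, matpow(P, s)), B)` with `matmul(matmul(p0, B), matpow(P_lumped, s))` checks the last derivation above.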



B. PARTITIONS OF A SET OF STATES

The matrix B consists of m nonzero orthogonal
n-dimensional column vectors whose components are zeros and
ones, which determine a specific partition of
$S = \{1,2,\ldots,n\}$. Example 1 illustrates this, where the
state space S = {1,2,3,4} is partitioned into
$\tilde{S} = \{\{1\}, \{2,3\}, \{4\}\} = \{L(1), L(2), L(3)\}$, and

\[
B = \begin{pmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{pmatrix}.
\]

Permutations of these columns give a matrix which also
lumps $\{X_t\}$. In order to see this, let $B^*$ be B with columns
permuted in some order. Then $B^* = BI^*$, where $I^*$ is the
identity matrix with its columns permuted in the same order. Now
if BAPB = PB, then, with $A^* = (B^{*\prime}B^*)^{-1}B^{*\prime}$,

\begin{align*}
B^*A^*PB^* &= BI^*(I^{*\prime}B'BI^*)^{-1}I^{*\prime}B'PBI^* \\
  &= BI^*(I^*)^{-1}(B'B)^{-1}(I^{*\prime})^{-1}I^{*\prime}B'PBI^* \\
  &= B(B'B)^{-1}B'PBI^* \\
  &= BAPBI^* \\
  &= PBI^* \\
  &= PB^*,
\end{align*}

so it follows that $\{X_t\}$ is also lumpable with respect to the
matrix $B^*$.
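The invariance under column permutation can be illustrated on Example 1 by listing the lumps in a different order. The sketch below is an assumption-laden illustration, not the author's procedure: the helper names are hypothetical and the order {4}, {1}, {2,3} is chosen arbitrarily.

```python
from fractions import Fraction as F

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lumping_matrices(partition, n):
    """Indicator matrix B and its row-normalized transpose A."""
    B = [[F(int(s in lump)) for lump in partition] for s in range(1, n + 1)]
    A = [[F(int(s in lump), len(lump)) for s in range(1, n + 1)]
         for lump in partition]
    return A, B

P = [[F(1, 4), F(1, 16), F(3, 16), F(1, 2)],
     [F(0),    F(1, 12), F(1, 12), F(5, 6)],
     [F(0),    F(1, 12), F(1, 12), F(5, 6)],
     [F(7, 8), F(1, 32), F(3, 32), F(0)]]

# Same lumps as Example 1, listed in a permuted order (columns of B permuted).
A_star, B_star = lumping_matrices([{4}, {1}, {2, 3}], 4)
PB_star = matmul(P, B_star)
still_lumps = matmul(B_star, matmul(A_star, PB_star)) == PB_star
P_star = matmul(A_star, PB_star)   # lumped matrix with relabeled states

print(still_lumps)
print([[str(v) for v in row] for row in P_star])
```

The resulting lumped matrix is the one from Example 1 with its rows and columns permuted in the same way, so the two lumped chains differ only in the labels of their states.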

Now, how many candidate lumping matrices are there? This
would be the number of partitions of S. [Ref. 5] gives a
recursion relation for the number $A_N$ of ways of partitioning
a set $S = \{1,2,\ldots,N\}$:

\[ A_N = \sum_{k=0}^{N-1} \binom{N-1}{k} A_k \qquad (N \ge 1,\; A_0 = 1). \tag{2.4} \]

From this relation we find $A_1 = 1$, $A_2 = 2$, $A_3 = 5$, $A_4 = 15$, etc.
Values of $A_N$ for larger N are shown in Table 1. The sizes of
the entries in Table 1 show that it would be impossible to
use a trial and error approach to finding lumping matrices B
for lumping a chain with larger state spaces, say with 10 or
more elements.

                         TABLE 1

             Partitions of a Set of N States

      N    Partitions          N    Partitions

      5            52         20    5.172415 x 10^13
      6           203         30    8.467490145 x 10^23
      7           877         40    1.574505884 x 10^35
      8          4140         50    1.857242688 x 10^47
      9         21147         60    9.769393075 x 10^59
     10        115975         70    1.80750039 x 10^73
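The counts in Table 1 can be regenerated from the recursion for the number of partitions of a set (the Bell numbers), $A_N = \sum_{k=0}^{N-1} \binom{N-1}{k} A_k$ with $A_0 = 1$. The following sketch assumes that this standard form is the relation (2.4) cited from [Ref. 5]; the function name `partition_counts` is hypothetical.

```python
from math import comb

def partition_counts(n_max):
    """Return [A_0, A_1, ..., A_n_max], the number of partitions of an N-set.

    Uses A_N = sum_{k=0}^{N-1} C(N-1, k) * A_k with A_0 = 1.
    """
    counts = [1]
    for N in range(1, n_max + 1):
        counts.append(sum(comb(N - 1, k) * counts[k] for k in range(N)))
    return counts

counts = partition_counts(70)
for N in (5, 6, 7, 8, 9, 10):
    print(N, counts[N])
for N in (20, 30, 40, 50, 60, 70):
    print(N, f"{counts[N]:.9e}")
```

Python integers are unbounded, so the large entries of the table are computed exactly and only rounded when printed.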



It is of interest to be able to systematically prescribe 
alternative lumpings by generating matrices B for a given 
transition matrix P, using some method other than trial and 
error. In the next section, we describe an approach to 
finding B matrices using the eigenvalues and eigenvectors of
P. 

C. AN EIGENVECTOR CONDITION FOR MARKOV CHAIN LUMPABILITY

Many problems in science and mathematics deal with a
linear operator $T : V \to V$, and it is of importance to
determine those scalars $\lambda$ for which the equation $Tx = \lambda x$ has
nonzero solutions x. In this section we discuss this
problem and its relationship with finding matrices B.

Theorem 2. The value 1 is always an eigenvalue for any
Markov chain transition probability matrix.

Proof. Let P be any n x n transition probability matrix
of $\{X_t\}$, x be a left eigenvector in $R^n$, and $\lambda$ be the
corresponding eigenvalue of P. Then $xP = \lambda x$, which is equivalent
to

\[ x(P - \lambda I) = 0. \tag{2.5} \]

For $\lambda$ to be an eigenvalue, there must be a nonzero solution
x of equation (2.5). Equation (2.5) will have a nonzero
solution if and only if

\[ \det(P - \lambda I) = 0. \tag{2.6} \]

This is called the characteristic equation. To show that
$\lambda = 1$ always satisfies equation (2.6), we need only show
that the columns of the matrix in equation (2.6) are
linearly dependent. Note that

\[
P - I = \begin{pmatrix}
p_{11} - 1 & p_{12} & \cdots & p_{1n} \\
p_{21} & p_{22} - 1 & \cdots & p_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
p_{n1} & p_{n2} & \cdots & p_{nn} - 1
\end{pmatrix}. \tag{2.7}
\]

Since $\sum_{j=1}^{n} p_{ij} = 1$ for Markov chains, it follows that the rows
in equation (2.7) sum to zero, so the determinant in
equation (2.6) is zero with $\lambda = 1$. It follows that $\lambda = 1$ is an
eigenvalue of the Markov chain $\{X_t\}$. We would next like to see
properties of eigenvectors corresponding to the eigenvalue
$\lambda = 1$.



Theorem 3. For any regular Markov chain, components of
the eigenvector corresponding to $\lambda = 1$ are proportional to
the steady state distribution of $\{X_t\}$.

Proof. Let x be a left eigenvector of P, and $\lambda$ be the
corresponding eigenvalue of P, such that $xP = \lambda x$, and
assume $\sum_{i} x_i = 1$. For given $\lambda = 1$, xP = x. The steady state
distribution of $\{X_t\}$ is unique [Ref. 4]. Therefore, x must
be the steady state distribution $\pi$ since $\sum_{i} x_i = 1$.

The following example demonstrates Theorem 3.

Example 2. Let

\[
P = \begin{pmatrix}
1/4 & 1/16 & 3/16 & 1/2 \\
0   & 1/12 & 1/12 & 5/6 \\
0   & 1/12 & 1/12 & 5/6 \\
7/8 & 1/32 & 3/32 & 0
\end{pmatrix}.
\]

The eigenvectors corresponding to the eigenvalues of P are
displayed as column vectors below:

   Eigenvalues     1          0         -0.25      -0.3333

   Eigenvectors    0.7367     0         10.5        0.8247
                   0.09209   -0.7201    -0.375     -0.03436
                   0.2236     0.7201    -4.125     -0.2405
                   0.6315     0         -6         -0.5498

Note that $\pi = (\pi_1, \pi_2, \pi_3, \pi_4)$
             = (0.4375, 0.0547, 0.1328, 0.375),

where $\pi_1 = 0.7367 / (0.7367 + 0.09209 + 0.2236 + 0.6315) = 0.4375$,
etc.



Theorem 4. Eigenvectors corresponding to eigenvalues
other than 1 are orthogonal to e = (1,1,...,1).

Proof. $xe' = x(Pe') = (xP)e' = \lambda xe'$. Therefore, xe'
must be zero for $\lambda \ne 1$.
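Theorems 2 through 4 can be checked on the matrix of Example 2. The sketch below is illustrative only: the eigenvectors are rescaled to exact rational vectors that are assumed to be proportional to the printed columns (for instance (24, -1, -7, -16) for the eigenvalue -1/3), and `vecmat` is a hypothetical helper name.

```python
from fractions import Fraction as F

def vecmat(x, M):
    """Row vector x times matrix M."""
    return [sum(xi * mij for xi, mij in zip(x, col)) for col in zip(*M)]

P = [[F(1, 4), F(1, 16), F(3, 16), F(1, 2)],
     [F(0),    F(1, 12), F(1, 12), F(5, 6)],
     [F(0),    F(1, 12), F(1, 12), F(5, 6)],
     [F(7, 8), F(1, 32), F(3, 32), F(0)]]

# (eigenvalue, left eigenvector) pairs, rescaled to exact rationals.
pairs = [
    (F(1),     [F(7, 16), F(7, 128), F(17, 128), F(3, 8)]),  # steady state
    (F(0),     [F(0), F(-1), F(1), F(0)]),
    (F(-1, 4), [F(21, 2), F(-3, 8), F(-33, 8), F(-6)]),
    (F(-1, 3), [F(24), F(-1), F(-7), F(-16)]),
]

# Theorem 2: every row of P sums to one, so P e' = e'.
row_sums = [sum(row) for row in P]

# Each pair satisfies xP = lambda x.
is_eigenpair = [vecmat(x, P) == [lam * xi for xi in x] for lam, x in pairs]

# Theorem 4: x e' = 0 whenever lambda differs from 1.
component_sums = [sum(x) for _, x in pairs]

print(row_sums == [1, 1, 1, 1], is_eigenpair, component_sums)
```

The first vector sums to one and is therefore the steady state distribution itself, as Theorem 3 states; the other three sum to zero.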

We are also interested in finding the relationship
between eigenvalues of P and those of the lumped transition
probability matrix $\tilde{P}$, where $\tilde{P} = APB$ as described above.

Theorem 5. Suppose $\{X_t\}$ with transition matrix P is
lumpable to $\{\tilde{X}_t\}$ with transition matrix $\tilde{P}$. The eigenvalues
of $\tilde{P}$ are eigenvalues of P.

Proof. Let $\phi(\lambda) = 0$ be the ($n^{th}$ degree) characteristic
equation of P. By the Cayley-Hamilton theorem [Ref. 6],

\[ \phi(P) = a_nP^n + \cdots + a_1P + a_0I = 0, \]

which together with equation (2.3) implies

\begin{align*}
A\phi(P)B &= a_n\tilde{P}^n + a_{n-1}\tilde{P}^{n-1} + \cdots + a_0I \\
  &= \phi(\tilde{P}) \\
  &= 0.
\end{align*}

Since $\tilde{P}$ satisfies P's characteristic equation and since
eigenvalues of $\phi(\tilde{P})$ are of the form $\phi(\tilde{\lambda})$, it follows that
$\phi(\tilde{\lambda}) = 0$. Thus all eigenvalues $\tilde{\lambda}$ of $\tilde{P}$ are also
eigenvalues of P.

We next examine the eigenvectors of P and $\tilde{P}$, with the
aim of identifying lumpings of $\{X_t\}$ directly in terms of the
eigenvectors of P. We have seen that $\tilde{p}^0$ is obtained directly
as $p^0B$; a similar relationship holds with eigenvectors of P.

Theorem 6. Suppose x is a left eigenvector of P
corresponding to eigenvalue $\lambda$, and suppose $\{X_t\}$ is lumpable to a
chain with transition matrix $\tilde{P} = APB$. Then xB satisfies the
equation $(xB)\tilde{P} = \lambda(xB)$.

Proof. By equation (2.1), $(xB)\tilde{P} = xBAPB = xPB$. But
$xP = \lambda x$, and the result follows.
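Theorem 6 can be illustrated with the chain of Examples 1 and 2. In this sketch (hypothetical helper `vecmat`, eigenvectors rescaled to exact rationals as an assumption), each left eigenvector x of P is mapped to xB, and the relation $(xB)\tilde{P} = \lambda(xB)$ is tested exactly.

```python
from fractions import Fraction as F

def vecmat(x, M):
    """Row vector x times matrix M."""
    return [sum(xi * mij for xi, mij in zip(x, col)) for col in zip(*M)]

B = [[1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1]]
P_lumped = [[F(1, 4), F(1, 4), F(1, 2)],
            [F(0),    F(1, 6), F(5, 6)],
            [F(7, 8), F(1, 8), F(0)]]

# Left eigenpairs of the 4-state matrix P of Example 2 (exact rescalings).
pairs = [
    (F(1),     [F(7, 16), F(7, 128), F(17, 128), F(3, 8)]),
    (F(0),     [F(0), F(-1), F(1), F(0)]),
    (F(-1, 4), [F(21, 2), F(-3, 8), F(-33, 8), F(-6)]),
    (F(-1, 3), [F(24), F(-1), F(-7), F(-16)]),
]

results = []
for lam, x in pairs:
    xB = vecmat(x, B)
    satisfies = vecmat(xB, P_lumped) == [lam * v for v in xB]
    results.append((lam, xB, satisfies))

for lam, xB, satisfies in results:
    print(str(lam), [str(v) for v in xB], satisfies)
```

For the eigenvalue 0 the product xB is the zero vector, so the relation holds trivially; 0 is an eigenvalue of P but not of the 3 x 3 lumped matrix, whose eigenvalues are 1, -1/4 and -1/3.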

We note that xB is not necessarily an eigenvector of $\tilde{P}$
because it may be zero. In fact, it easily follows that
xB = 0 if $\lambda$ is not an eigenvalue of $\tilde{P}$. But xB may be null
even if $\lambda$ is an eigenvalue of $\tilde{P}$, in cases where $\lambda$ is a
repeated eigenvalue of P more times than of $\tilde{P}$.

[Ref. 7] pointed out some other useful properties
associated with eigenvalues and eigenvectors, such as: 1) if the
matrix P is symmetric, then eigenvalues are real and
eigenvectors are different for repeated eigenvalues, 2) if the
matrix is not symmetric, then the eigenvectors are the same
for repeated eigenvalues.

Theorem 7. If $\{X_t\}$ with transition matrix P is lumpable
to $\{\tilde{X}_t\}$ with transition matrix $\tilde{P}$, and $\{\tilde{X}_t\}$ is lumpable to
$\{\tilde{\tilde{X}}_t\}$ with transition matrix $\tilde{\tilde{P}}$, then $\{X_t\}$ is directly
lumpable to $\{\tilde{\tilde{X}}_t\}$, where $\{\tilde{\tilde{X}}_t\}$ is the lumped chain of $\{\tilde{X}_t\}$.

Proof. Let $\{X_t\}$ be lumpable to $\{\tilde{X}_t\}$, and $\{\tilde{X}_t\}$ be
lumpable to $\{\tilde{\tilde{X}}_t\}$, by matrices $B_1$ and $B_2$, where $B_1$ and $B_2$ are
lumping matrices in which the dimension n x m of $B_1$ is
greater than that of $B_2$. By equation (2.1), $\tilde{P} = A_1PB_1$ and
$\tilde{\tilde{P}} = A_2\tilde{P}B_2$. Thus,

\[ \tilde{\tilde{P}} = A_2\tilde{P}B_2 = A_2(A_1PB_1)B_2 = (A_2A_1)P(B_1B_2). \]

To see that $B_1B_2$ is a lumping matrix and $A_2A_1$ is of the
required form, we need to show that $(A_2A_1)(B_1B_2)$ is the
identity matrix as mentioned in Section A. But

\[ (A_2A_1)(B_1B_2) = A_2(A_1B_1)B_2 = A_2IB_2 = A_2B_2 = I. \]

Also, note that $B_1B_2$ is $B_1$ lumped by $B_2$, so $B_1B_2$ has columns
of the required form. Therefore, $\{X_t\}$ is directly lumpable
to $\{\tilde{\tilde{X}}_t\}$ by the lumping matrix $B = B_1B_2$.

Example 3. Consider a Markov chain with 5 states, and
transition probability matrix

\[
P = \begin{pmatrix}
0.3 & 0.1 & 0.2 & 0.1 & 0.3 \\
0.1 & 0.3 & 0.1 & 0.3 & 0.2 \\
0.5 & 0.1 & 0   & 0.1 & 0.3 \\
0.1 & 0.5 & 0.2 & 0.1 & 0.1 \\
0.5 & 0   & 0.1 & 0.2 & 0.2
\end{pmatrix}.
\]

First, consider S = {1,2,3,4,5}, which can be partitioned to
$\tilde{S} = \{\{1\}, \{2,4\}, \{3,5\}\} = \{L(1), L(2), L(3)\}$. The corresponding
lumping matrices are

\[
B_1 = \begin{pmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{pmatrix}
\quad \text{and} \quad
A_1 = \begin{pmatrix}
1 & 0   & 0   & 0   & 0   \\
0 & 0.5 & 0   & 0.5 & 0   \\
0 & 0   & 0.5 & 0   & 0.5
\end{pmatrix},
\]

and the lumped transition probability matrix is

\[
\tilde{P} = A_1PB_1 = \begin{pmatrix}
0.3 & 0.2 & 0.5 \\
0.1 & 0.6 & 0.3 \\
0.5 & 0.2 & 0.3
\end{pmatrix}.
\]

Secondly, consider $\tilde{S}$ with 3 states, which can be partitioned
to $\tilde{\tilde{S}} = \{\{1,3\}, \{2\}\} = \{L^*(1), L^*(2)\}$, with matrices

\[
B_2 = \begin{pmatrix}
1 & 0 \\
0 & 1 \\
1 & 0
\end{pmatrix}
\quad \text{and} \quad
A_2 = \begin{pmatrix}
0.5 & 0 & 0.5 \\
0   & 1 & 0
\end{pmatrix}.
\]

The corresponding lumped transition matrix is

\[
\tilde{\tilde{P}} = A_2\tilde{P}B_2 = \begin{pmatrix}
0.8 & 0.2 \\
0.4 & 0.6
\end{pmatrix}.
\]

Finally, consider lumping the transition probability matrix
directly. For the partitioning
$\tilde{\tilde{S}} = \{\{1,3,5\}, \{2,4\}\} = \{L''(1), L''(2)\}$,

\[
B = B_1B_2 = \begin{pmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{pmatrix}
\begin{pmatrix}
1 & 0 \\
0 & 1 \\
1 & 0
\end{pmatrix}
= \begin{pmatrix}
1 & 0 \\
0 & 1 \\
1 & 0 \\
0 & 1 \\
1 & 0
\end{pmatrix}
\quad \text{and} \quad
A = \begin{pmatrix}
1/3 & 0   & 1/3 & 0   & 1/3 \\
0   & 1/2 & 0   & 1/2 & 0
\end{pmatrix},
\]

and the directly lumped transition probability matrix is

\[
\tilde{\tilde{P}} = APB = \begin{pmatrix}
0.8 & 0.2 \\
0.4 & 0.6
\end{pmatrix}.
\]
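Example 3 can be recomputed to confirm that two successive lumpings give the same result as the direct lumping by $B = B_1B_2$. This is a sketch under assumptions: the 5 x 5 matrix is the one recovered from the example (its entries are assumed to be 0.3, 0.1, 0.2, 0.1, 0.3 in the first row and so on, each row summing to one), and the helper names are hypothetical.

```python
from fractions import Fraction as F

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lumping_matrices(partition, n):
    """Indicator matrix B and its row-normalized transpose A."""
    B = [[F(int(s in lump)) for lump in partition] for s in range(1, n + 1)]
    A = [[F(int(s in lump), len(lump)) for s in range(1, n + 1)]
         for lump in partition]
    return A, B

# Transition matrix of Example 3, entries in tenths (assumed reconstruction).
tenths = [[3, 1, 2, 1, 3],
          [1, 3, 1, 3, 2],
          [5, 1, 0, 1, 3],
          [1, 5, 2, 1, 1],
          [5, 0, 1, 2, 2]]
P = [[F(v, 10) for v in row] for row in tenths]

# First lumping: 5 states -> 3 lumps.
A1, B1 = lumping_matrices([{1}, {2, 4}, {3, 5}], 5)
P1 = matmul(A1, matmul(P, B1))

# Second lumping: 3 lumped states -> 2 lumps.
A2, B2 = lumping_matrices([{1, 3}, {2}], 3)
P2 = matmul(A2, matmul(P1, B2))

# Direct lumping: 5 states -> 2 lumps.
A, B = lumping_matrices([{1, 3, 5}, {2, 4}], 5)
P_direct = matmul(A, matmul(P, B))

print([[str(v) for v in row] for row in P2])
print(P2 == P_direct, matmul(B1, B2) == B)
```

Note that the product $A_2A_1$ is not the same matrix as the A built from the direct partition, yet both yield the same lumped matrix here because the rows of PB within each lump are identical.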



Theorem 7 shows that lumping is "transitive", in the
following sense. Define two transition matrices P and Q to
be equivalent, ($P \equiv Q$), if $Q = \tilde{P}$ for a lumping matrix B
whose columns are those of the identity matrix, in some
permuted order. (Thus the chains $\{X_t\}$ and $\{Y_t\}$ differ only in
the labels associated with their states.) Define a relation
"$\le$" between transition matrices as follows: $Q \le P$ if and
only if $Q = \tilde{P}$ for some lumping matrix B. Then Theorem 7
shows that $Q \le P$, $R \le Q \Rightarrow R \le P$. This relation "$\le$" is
reflexive, since $Q \le Q$ using the lumping matrix I
(identity). Finally, "$\le$" is antisymmetric since $Q \le P$ and
$P \le Q \Rightarrow P \equiv Q$. Thus, the set of all transition probability
matrices is partially ordered by the "lumping" partial
order, "$\le$".

III. BOUNDS ON THE LARGEST ERROR, $\Delta$, IN $\hat{P}$

In this chapter we consider three procedures to find
bounds on $\Delta$. First, we use the central limit theorem for
given i and j. Secondly, we use a binomial approximation on
the basis of the first procedure. Finally, we get the
largest error $\Delta$ using the asymptotic extreme value
distribution. These three approximations are only designed to give
a rough idea of the relationships between $\Delta$ and the number
M of elements in the state space, the total number of
observed transitions $K_{\cdot\cdot}$, and the probability $\alpha$.

A. APPROACH USING CENTRAL LIMIT THEOREM

We are interested in the sizes of the errors between the
estimate $\hat{P}$ and the unknown P, where P is the transition
probability matrix of $\{X_t\}$. We assume the transition
probability matrix P is of size M x M.

Let $K_{ij}$ be the number of observed transitions of $\{X_t\}$
from state i to state j, and let $K_{i\cdot}$ be the number of
observed transitions from state i. Similarly, $K_{\cdot j}$ is the
number of observed transitions into state j.

Let $p_{ij}$ be an unknown transition probability from state i
to state j and $\hat{p}_{ij}$ be an estimate of $p_{ij}$ based on $K_{\cdot\cdot}$ observed
transitions. Then the usual estimate $\hat{p}_{ij}$ of $p_{ij}$ is the ratio
of $K_{ij}$ to $K_{i\cdot}$. Now, as a rough approximation, imagine that
$K_{i\cdot}$ is fixed, and the number of transitions from state i to
state j, $K_{ij}$, is Binomial($K_{i\cdot}$, $p_{ij}$). Then by the central
limit theorem,

\[ \hat{p}_{ij} \text{ is approximately Normal}\left[ p_{ij},\; \frac{K_{i\cdot}\,p_{ij}(1 - p_{ij})}{(K_{i\cdot})^2} \right], \]

since $E[\hat{p}_{ij}] = p_{ij}$, and

\begin{align*}
\mathrm{Var}[\hat{p}_{ij}] &= \mathrm{Var}\left[\frac{K_{ij}}{K_{i\cdot}}\right] \\
  &= \frac{\mathrm{Var}[K_{ij}]}{(K_{i\cdot})^2} \\
  &= \frac{K_{i\cdot}\,p_{ij}(1 - p_{ij})}{(K_{i\cdot})^2}. \tag{3.1}
\end{align*}



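We want to find a bound $\Delta$ on the estimation error $|\hat{p}_{ij} - p_{ij}|$
which occurs with probability at least $\alpha$; that is, the
largest $\Delta$ for which

\[ P[\,|\hat{p}_{ij} - p_{ij}| \ge \Delta\,] \ge \alpha. \]

Now

\[ P\left[ \frac{|\hat{p}_{ij} - p_{ij}|}{\sqrt{K_{i\cdot}\,p_{ij}(1 - p_{ij})/(K_{i\cdot})^2}} \ge \frac{\Delta}{\sqrt{K_{i\cdot}\,p_{ij}(1 - p_{ij})/(K_{i\cdot})^2}} \right] \ge \alpha. \tag{3.2} \]

Let $Z = \dfrac{\hat{p}_{ij} - p_{ij}}{\sqrt{K_{i\cdot}\,p_{ij}(1 - p_{ij})/(K_{i\cdot})^2}}$, then

\[ P[\,|\hat{p}_{ij} - p_{ij}| \ge \Delta\,] = P\left[ |Z| \ge \frac{\Delta}{\sqrt{K_{i\cdot}\,p_{ij}(1 - p_{ij})/(K_{i\cdot})^2}} \right]. \]

Z is approximately standard Normal. Rewrite equation (3.2)
as

\[ P\left[ |Z| \ge \frac{\Delta}{\sqrt{p_{ij}(1 - p_{ij})/K_{i\cdot}}} \right] \ge \alpha, \qquad 0 < \alpha < 1. \]

Equation (3.2) is approximately

\[ P\left[ Z \ge \frac{\Delta}{\sqrt{p_{ij}(1 - p_{ij})/K_{i\cdot}}} \right] \ge \frac{\alpha}{2}, \]

since the Normal distribution is symmetric. Solving for $\Delta$,
we have

\[ \Delta \le N^{-1}\left(1 - \frac{\alpha}{2}\right) \sqrt{\frac{p_{ij}(1 - p_{ij})}{K_{i\cdot}}}, \]

where $N^{-1}(1 - \frac{\alpha}{2})$ is the $(1 - \frac{\alpha}{2})$ quantile of the
standard Normal distribution. Suppose the steady state
probability $\pi_i$ of state i is $\frac{1}{M}$ based on the equally likely
case, and suppose the worst case in which $p_{ij} = 0.5$.
Then an approximate value for $\Delta$ is given by

\begin{align*}
\Delta &= N^{-1}\left(1 - \frac{\alpha}{2}\right) \sqrt{\frac{(0.5)(0.5)}{K_{i\cdot}}} \\
  &= N^{-1}\left(1 - \frac{\alpha}{2}\right) (0.5) \sqrt{\frac{1}{K_{i\cdot}}} \\
  &= N^{-1}\left(1 - \frac{\alpha}{2}\right) (0.5) \sqrt{\frac{1}{\pi_i K_{\cdot\cdot}}} \\
  &= N^{-1}\left(1 - \frac{\alpha}{2}\right) (0.5) \sqrt{\frac{M}{K_{\cdot\cdot}}}. \tag{3.3}
\end{align*}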



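Equation (3.3) concerns the error $|\hat{p}_{ij} - p_{ij}|$ for fixed i
and j. We would now like to find an error bound $\Delta$ over all i
and j. That is, we wish to find the largest $\Delta$ for which

\[ P[\,|\hat{p}_{ij} - p_{ij}| \ge \Delta \text{ for some i and j}\,] \ge \alpha, \]

which is roughly the same as

\[ P[\,\hat{p}_{ij} - p_{ij} \ge \Delta \text{ for some i and j}\,] \ge \frac{\alpha}{2}. \tag{3.4} \]

We apply the binomial approximation in equation (3.4), so
that

\begin{align*}
P[\,\hat{p}_{ij} - p_{ij} \ge \Delta \text{ for some i and j}\,]
  &= 1 - P[\,\hat{p}_{ij} - p_{ij} < \Delta \text{ for all i and j}\,] \\
  &= 1 - \left(1 - \frac{\alpha'}{2}\right)^{M^2} \tag{3.5}
\end{align*}

for some $0 < \alpha' < 1$. Let $1 - (1 - \frac{\alpha'}{2})^{M^2} = \frac{\alpha}{2}$. Solve for $\alpha'$, which
gives

\[ \alpha' = 2\left[ 1 - \left(1 - \frac{\alpha}{2}\right)^{1/M^2} \right]. \tag{3.6} \]

Substitute the value of $\alpha'$ in equation (3.6) into equation
(3.3). Finally we get the approximate bound $\Delta$ for all i and
j:

\[ \Delta = N^{-1}\left( \left(1 - \frac{\alpha}{2}\right)^{1/M^2} \right) (0.5) \sqrt{\frac{M}{K_{\cdot\cdot}}}. \tag{3.7} \]

Equation (3.7) gives an approximate expression for $\Delta$ using the
binomial approximation.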
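The two central limit theorem bounds can be evaluated numerically. The sketch below rests on an assumption about the form of the formulas: it takes (3.3) to be $\Delta = N^{-1}(1 - \alpha/2)(0.5)\sqrt{M/K_{\cdot\cdot}}$ and (3.7) to be the same expression with the quantile level replaced by $(1 - \alpha/2)^{1/M^2}$. The function names `delta_fixed` and `delta_overall` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def delta_fixed(alpha, M, K):
    """Error bound for one fixed (i, j), assumed form of equation (3.3).

    alpha: probability level, M: number of states,
    K: total number of observed transitions.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * 0.5 * sqrt(M / K)

def delta_overall(alpha, M, K):
    """Error bound over all M*M entries, assumed form of equation (3.7)."""
    level = (1 - alpha / 2) ** (1 / M ** 2)
    return NormalDist().inv_cdf(level) * 0.5 * sqrt(M / K)

for M in (20, 30):
    print(M, round(delta_fixed(0.90, M, 5000), 4),
          round(delta_overall(0.90, M, 5000), 4))
```

Under these assumed forms, the overall bound for M = 20 or 30 states and 5000 observed transitions comes out near 0.1, which is in line with the magnitude quoted later in this chapter, while the bound for a single fixed entry is far smaller.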

B. APPROACH USING ORDER STATISTICS

Assume $Z_1, Z_2, \ldots, Z_M$ are independent continuous
random variables, each with density function $f_Z(z)$ and
distribution $F_Z(z)$. Now let $Z_{(1)}, Z_{(2)}, \ldots, Z_{(M)}$ denote
their ordered values, from smallest $Z_{(1)}$ to largest $Z_{(M)}$;
these are called the order statistics of $Z_1, Z_2, \ldots, Z_M$.
We now consider the probability law for $Z_{(M)}$ [Ref. 8], the
largest or maximum value.

The event $\{Z_{(M)} < z\}$ occurs if and only if the event
$\{Z_1 < z, Z_2 < z, \ldots, Z_M < z\}$ occurs, since if the largest
Z is smaller than z, all M of the random variables must be
smaller than z, where z is any fixed real number. The
distribution function for $Z_{(M)}$ is

\begin{align*}
F_{Z_{(M)}}(z) &= P[Z_{(M)} < z] \\
  &= P[Z_1 < z, Z_2 < z, \ldots, Z_M < z] \\
  &= P[Z_1 < z]\,P[Z_2 < z] \cdots P[Z_M < z],
\end{align*}

since $Z_1, Z_2, \ldots, Z_M$ are assumed independent. But each of
$Z_1, Z_2, \ldots, Z_M$ has the same distribution $F_Z(z)$. So

\[ F_{Z_{(M)}}(z) = [F_Z(z)]^M. \]

The density function for $Z_{(M)}$ then is

\[ f_{Z_{(M)}}(z) = \frac{d}{dz}[F_Z(z)]^M = M[F_Z(z)]^{M-1} f_Z(z), \]

where $f_Z(z) = \dfrac{dF_Z(z)}{dz}$.

Consider the limiting distribution function of the
maximum $Z_{(M)}$ as M tends to infinity. [Ref. 9, 10] show this
distribution is

^ -yfxtyV \ ( z ) 

lim[F (z)] M = £ (3.8) 

M-9oo 2 

if $Z_1, Z_2, \ldots, Z_M$ is a random sample from a standard Normal
population. We want to find a bound $\Delta$ on the largest of $M^2$
errors between estimates in $\hat{P}$ and the unknown components of
P. The random variables $\hat{p}_{ij} - p_{ij}$ are very roughly Normal with
mean 0 and variance $\frac{M}{4K_{\cdot\cdot}}$, which is derived from equation
(3.1) for i, j = 1,2,...,M. Recall that $K_{\cdot\cdot}$ is the total
number of transitions observed.

Let $X_l = \hat{p}_{ij} - p_{ij}$ where $l = 1,2,\ldots,M^2$. Then we know the
random variable $X_l$ is approximately equal to $\frac{1}{2}\sqrt{\frac{M}{K_{\cdot\cdot}}}\,Z$, where
Z has a standard Normal distribution. Let the random
variable $X_{(M^2)}$ be equal to $\max|\hat{p}_{ij} - p_{ij}|$. Then



\begin{align*}
\lim P[X_{(M^2)} \le \Delta] &= \lim P[\max|\hat{p}_{ij} - p_{ij}| \le \Delta] \\
  &= \lim P[\text{smallest of } (\hat{p}_{ij} - p_{ij}) \ge -\Delta
     \text{ and largest of } (\hat{p}_{ij} - p_{ij}) \le \Delta].
\end{align*}

Now $X_{(1)}$ and $X_{(M^2)}$ are asymptotically independent, so for large
M,

\begin{align*}
P[X_{(M^2)} \le \Delta]
  &= \{ P[X_1 > -\Delta] \cdots P[X_{M^2} > -\Delta] \} \cdot [F_X(\Delta)]^{M^2} \\
  &= [1 - F_X(-\Delta)]^{M^2} [F_X(\Delta)]^{M^2} \\
  &= [F_X(\Delta)]^{2M^2}. \tag{3.9}
\end{align*}

From equation (3.9) we derive an expression for $\Delta$ as
follows. Let $\Delta$ be the largest value for which

\[ P[X_{(M^2)} \le \Delta] \le 1 - \alpha. \]

This is the complementary probability because we wish to
have $P[\,|\hat{p}_{ij} - p_{ij}| > \Delta \text{ for some i and j}\,] \ge \alpha$, as in the
previous section. The limiting distribution function of the
maximum $X_{(M^2)}$ is the same as



!)<»*■ 

lim[F (a) ] 
M-wo 




Then, approximately. 



log (1 - oC ) 




and
log {-leg ( 1 -<*)}* -A/21og2M2 (2 -a/^Tm^) .



Finally, 



A 




fy/21og2M ; 






( 3 . 10 ) 



Equation (3.10) is an approximate expression for $\Delta$ based on
the asymptotic distribution of the extreme order statistic.
We will compare the central limit theorem $\Delta$'s with those
obtained with the extreme value distribution in the next
section.



C. COMPARISON OF THE THREE EXPRESSIONS

The three expressions for $\Delta$ obtained using the central
limit theorem and order statistics have been developed under
approximations such as: 1) the steady state distribution
of $\{X_t\}$ is $\pi_i = 1/M$ (equally likely), 2) the variances of
$|\hat{p}_{ij} - p_{ij}|$ have their maximum value (worst case), and 3)
all transitions are independent. Information about $\{X_t\}$ is
from the estimate $\hat{P}$ because we do not have information about
the unknown P. In view of the above approximations and
computations, our expressions for $\Delta$ are very rough. However,
they do provide some insight into the occurrences and sizes
of estimation errors in $\hat{P}$.

Figure 3.1 contains 3 graphs showing $\Delta$ as a function of
$K_{\cdot\cdot}$ and M for fixed $\alpha$ = 0.90 based on the three expressions
(3.3), (3.7) and (3.10).

The first graph shows error bounds using the central limit
theorem on $\hat{p}_{ij} - p_{ij}$ for fixed i and j. The second graph is
given by the same approach as the first graph, except
overall estimation errors are considered, for all i and j.
The third graph is based on the asymptotic distribution of
the largest value of $|\hat{p}_{ij} - p_{ij}|$ over all i and j.

[Figure 3.1 Variation of $\Delta$ for Varying $K_{\cdot\cdot}$ and M. Third
panel title: overall estimation by using order statistics.]

From Figure 3.1 we see that the largest estimation error
depends very much on the number of transition observations
and matrix size, but not so much on the $\alpha$ value, as seen
from Figure 3.2.

[Figure 3.2 Variation of $\Delta$ by Changing Alpha ($\alpha$). Vertical
axis: maximum value of $\Delta$. Panel titles: estimation by using
C.L.T.; overall estimation by using order statistics.]

Graphs 2 and 3 in Figure 3.1 are very similar even though
they use different approaches. They give an idea of how
large likely values of $\Delta$ are for given $K_{\cdot\cdot}$ and M, in the
"worst case".

If we consider a Markov chain $\{X_t\}$ with M = 20 or 30
states, and we have observed $K_{\cdot\cdot}$ = 5000 transitions then,
roughly, it is likely (prob = 0.90) that at least one
element of $\hat{P}$ is in error by at least 0.1. In general,
expressions (3.7) and (3.10) may be useful for Markov chains
$\{X_t\}$ with M = 20 or 30 states and large numbers of observed
transitions.

D. SENSITIVITY OF LUMPING CONDITIONS

We have developed expressions for $\Delta$, using the central
limit theorem and order statistics. We want to examine the
sensitivity of the lumping conditions applied to $\hat{P}$, the
estimate of P. If equation (2.1), which is a necessary
condition for lumping a Markov chain with transition matrix
P, is satisfied, then the lumped transition matrix $\tilde{P}$ is
given by equation (2.2). However, even though P satisfies
the lumping conditions, it is extremely unlikely that its
estimate $\hat{P}$ will also satisfy these conditions, as we shall
now demonstrate.

In order to simulate the difference between $\hat{P}$ and P,
consider a matrix of errors $R\Delta$, where R is a random matrix
with dimension the same as P, whose components are 1's,
-1's, and 0's, where the sum of each row is zero. Now
consider the lumpability of the simulated estimate $P^*$, which
is constructed by taking P plus the random matrix R times
$\Delta$, that is, $P^* = P + R\Delta$.

To show the sensitivity of the lumping conditions, we
assume the unknown P is lumpable with lumping matrix B, and
consider the difference $(BAP^*B - P^*B)$. If equation (2.1) is
satisfied by $P^*$ then all of these components must be zero.

Theorem 8. The difference $(BAP^*B - P^*B)$ is a linear
function of $\Delta$.

Proof. Let R be the random matrix as defined above and
let $P^* = P + R\Delta$. Then $(BAP^*B - P^*B)$ is given by

\begin{align*}
BA(P + R\Delta)B - (P + R\Delta)B &= BAPB + BARB\Delta - PB - RB\Delta \\
  &= (BAPB - PB) + (BARB - RB)\Delta \\
  &= (BARB - RB)\Delta \\
  &= C\Delta.
\end{align*}

Therefore the difference $BAP^*B - P^*B$ is linearly
dependent on $\Delta$, and $P^*$ is not lumpable unless BARB = RB (i.e., R
is "lumpable"), which is not likely to occur.
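Theorem 8 can be illustrated on the lumpable chain of Example 1. The sketch below is an illustration under assumptions: the matrix R is one fixed, hand-picked sign pattern with zero row sums rather than a randomly drawn one, and the helper names `matmul`, `residual` and `perturbed` are hypothetical.

```python
from fractions import Fraction as F

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def residual(M, A, B):
    """Return BAMB - MB, which is zero exactly when M satisfies (2.1)."""
    MB = matmul(M, B)
    BAMB = matmul(B, matmul(A, MB))
    return [[u - v for u, v in zip(r1, r2)] for r1, r2 in zip(BAMB, MB)]

P = [[F(1, 4), F(1, 16), F(3, 16), F(1, 2)],
     [F(0),    F(1, 12), F(1, 12), F(5, 6)],
     [F(0),    F(1, 12), F(1, 12), F(5, 6)],
     [F(7, 8), F(1, 32), F(3, 32), F(0)]]
B = [[1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1]]
A = [[1, 0, 0, 0], [0, F(1, 2), F(1, 2), 0], [0, 0, 0, 1]]

# Hand-picked error pattern: entries in {1, -1, 0}, every row sums to zero.
R = [[1, -1, 0, 0],
     [0, 1, 0, -1],
     [1, 0, -1, 0],
     [-1, 0, 1, 0]]

def perturbed(delta):
    """Simulated estimate P* = P + R * delta."""
    return [[p + r * delta for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(P, R)]

C = residual(R, A, B)   # the matrix BARB - RB of Theorem 8
for delta in (F(1, 100), F(1, 20)):
    res = residual(perturbed(delta), A, B)
    print(str(delta), res == [[c * delta for c in row] for row in C])
```

For this R the matrix C has nonzero rows for the two states of the lump {2,3}, so the lumping condition fails for every nonzero $\Delta$, however small, and the size of the violation grows in proportion to $\Delta$.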

Since $\hat{P}$ is likely to have elements differing appreciably
from the corresponding elements in P (errors of size $\Delta$), it
can be seen that the lumpability conditions will not be
satisfied (not even nearly so) by $\hat{P}$ even though $\{X_t\}$ is
lumpable. We conclude that attempting to check the
lumpability of the estimate $\hat{P}$ when P is not known is not useful.

IV. SUMMARY AND CONCLUSIONS



We have given several theorems associated with
eigenvalues and eigenvectors for lumpable Markov chains $\{X_t\}$ with
finite state spaces. We have derived rough, approximate
mathematical expressions for the largest error made in
estimating P by $\hat{P}$ based on transition data.

Both expressions (3.7) and (3.10) are very similar even
though the estimated $\Delta$'s for the first expression are
slightly less than those in the second expression. These
expressions show that the largest estimation errors depend
very much on the number of transition observations and on
the matrix size, but not so much on the $\alpha$ value.

Since $\hat{P}$ is likely to have elements differing appreciably
from the corresponding elements in P, it is of interest to
examine whether the equation BAPB = PB is likely to be
nearly satisfied with $\hat{P}$, i.e., will $(BA\hat{P}B - \hat{P}B)$ be nearly
zero? This is examined by simulation of "estimates" $P^*$ of
P, using random perturbations of elements of P of sizes $\Delta$
which are likely to occur as errors in $\hat{P}$.

This shows that the classical lumping conditions are
extremely sensitive to estimation errors which can be
expected to occur even when a large number of transitions
have been observed. Thus, the classical lumping conditions
may be of limited value in many actual applications.

As further research, it is recommended that some
constructive approach to finding matrices B for lumping a
lumpable Markov chain $\{X_t\}$ be developed, perhaps along the
lines of the theorems mentioned in Chapter 2. It is hoped
that the present study will be useful to those who might
otherwise have endeavored to check the classical condition
for lumpability of a Markov chain when the transition
matrix P has been estimated.